Session 5

R Introduction Workshop

May 08, 2019

Refresher: (5 mins)

  1. Load crime.csv using read.table() or read.csv() and assign it to the variable crime.

  2. What is the mean murder rate in the US according to the crime data?

  3. Load the first sheet of titanic.xlsx using the Import Dataset button in RStudio.

  4. In total, how many females perished on the Titanic?

Visualization with R

There are 3 main families of visualization functions:

Base R visualization

Basic plot syntax:
plot(x, y)
  x: vector for the x axis
  y: vector for the y axis

See ?plot

x <- 1:10 
y <- 1:10
plot(x, y)

Base R visualization: Scatterplot with iris

plot(iris$Sepal.Width, iris$Sepal.Length)
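
A default scatterplot labels the axes with the raw expressions, such as iris$Sepal.Width. The sketch below shows one way to add a title, readable axis labels and a color per species. The palette and the names species_cols and point_cols are illustrative choices, not part of the workshop material.

```r
# Illustrative palette: one color per species (an assumption, any colors work)
species_cols <- c(setosa = "tomato", versicolor = "steelblue", virginica = "darkgreen")

# Indexing the palette by species name gives one color per row of iris
point_cols <- species_cols[as.character(iris$Species)]

plot(iris$Sepal.Width, iris$Sepal.Length,
     main = "Iris sepal dimensions",
     xlab = "Sepal width (cm)",
     ylab = "Sepal length (cm)",
     col  = point_cols,
     pch  = 19)                      # pch = 19 draws solid circles

# Add a legend so the colors can be decoded
legend("topright", legend = names(species_cols), col = species_cols, pch = 19)
```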

Base R visualization: Histogram with iris

hist(iris$Sepal.Width)
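
hist() also returns the binning it computed, which can help when checking a plot. A small sketch, assuming that roughly 20 bins is a sensible choice here (breaks is only a suggestion to R, so the actual number of bins may differ):

```r
h <- hist(iris$Sepal.Width,
          breaks = 20,
          main = "Distribution of sepal width",
          xlab = "Sepal width (cm)")

# The returned object stores the bin edges and the count in each bin
h$breaks
h$counts
sum(h$counts)   # every observation falls into exactly one bin
```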

Base R visualization: Using par() to plot multiple plots

par(mfrow=c(1,2))
plot(iris$Sepal.Width, iris$Sepal.Length)
hist(iris$Sepal.Width)
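
The mfrow setting stays active for every later plot on the same device. One possible pattern, sketched here with the hypothetical name old_par, is to save the previous settings and restore them afterwards:

```r
# par() returns the previous values of the settings it changes
old_par <- par(mfrow = c(1, 2))      # 1 row, 2 columns

plot(iris$Sepal.Width, iris$Sepal.Length)
hist(iris$Sepal.Width)

# Restore the previous layout so the next plot fills the whole device
par(old_par)
```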

plot() vs ggplot()

A picture is worth a thousand words – when the picture is good

Add layers to ggplot()

And make it interactive with ggplotly()

ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics

Installation

# The easiest way to get ggplot2 is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just ggplot2:
install.packages("ggplot2")

# Don't forget to load tidyverse into your R session
library(tidyverse)

# Or just ggplot2
library(ggplot2)

Usage

  1. Start with ggplot(),
    • supply a dataset
    • and aesthetic mapping using aes().
  2. You can then add on layers such as:
    • Geom (geometric object) with various geom_ functions.
    • Scales with various scale_ or labs() and lims() functions.
    • Faceting specifications with facet_ functions
    • Coordinate systems with coord_ functions
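
The list above can be sketched as a single plot that uses one function from each group. The labels, limits and the name p_layers are illustrative assumptions; the step-by-step build that follows is the workshop's own route.

```r
library(ggplot2)

p_layers <- ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length)) +  # data + aesthetics
  geom_point(aes(color = Species)) +                  # geom: one point per flower
  labs(title = "Iris sepals",
       x = "Sepal width (cm)",
       y = "Sepal length (cm)") +                     # labels
  lims(x = c(1.5, 5)) +                               # axis limits
  facet_wrap(~ Species) +                             # one panel per species
  coord_flip()                                        # swap the x and y axes
p_layers
```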

Building a ggplot from scratch with iris

Step 0: Let’s remember the iris data

head(iris, 3)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 

Step 1. Define data and aesthetics with aes()

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p

Step 2. Define plot type with geom_

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point()

Step 3. Assign more aesthetics

Step 3.1 Add color

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species))

Step 3.2 Add color + size

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length))

Step 3.3 Add color + size + alpha (transparency)

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width))

Step 3.4 Add color + size + alpha + shape

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width, shape=Species))
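
All the aesthetics in Step 3 are mapped to columns inside aes(). To give every point the same fixed value instead, the argument goes outside aes(). A minimal sketch of the difference (the names p_mapped and p_fixed and the values "blue" and 3 are arbitrary):

```r
library(ggplot2)

p <- ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length))

# Mapped: color depends on the Species column, and a legend is drawn
p_mapped <- p + geom_point(aes(color = Species))

# Set: every point gets the same fixed color and size, no legend
p_fixed <- p + geom_point(color = "blue", size = 3)

p_mapped
p_fixed
```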

Step 4. Customize legend

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width, shape=Species)) +
    guides( color=guide_legend(ncol = 3, byrow = TRUE), 
            size=guide_legend(ncol = 3, byrow = TRUE), 
            alpha=guide_legend(ncol = 3, byrow = TRUE))
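
Besides guides(), legends can also be adjusted through labs() for the titles and theme() for the position. A short sketch, with made-up legend titles:

```r
library(ggplot2)

p_legend <- ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point(aes(color = Species, size = Petal.Length)) +
  labs(color = "Iris species", size = "Petal length (cm)") +  # legend titles
  theme(legend.position = "bottom")                           # move legends below the plot
p_legend
```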

Step 5: Assign more geom: point + smooth

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(color=Species, size=Petal.Length, alpha=Petal.Width)) + 
    geom_smooth()

What will this give me?

Oops! What happened?

Step 5.1: Assign more geom: point + smooth

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length, color=Species))
p + geom_point(aes(size=Petal.Length, alpha=Petal.Width)) + 
    geom_smooth()

Why did this work now?
Can you see the difference?

Step 5.2: Assign more geom: point + smooth

p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length))
p + geom_point(aes(size=Petal.Length, alpha=Petal.Width)) + geom_smooth(aes(color=Species))

What about this? What’s happening here?
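
One way to explore these questions: aesthetics given in ggplot() are inherited by every layer, while aesthetics given inside a geom apply to that geom only. The sketch below counts the curves drawn by the smoothing layer in each case. It assumes method = "lm" to keep the fit simple and quiet; the slides use the default smoother.

```r
library(ggplot2)

# Color mapped only inside geom_point(): the smoother does not see Species
p_single <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point(aes(color = Species)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Color mapped in ggplot(): both layers inherit it, one fit per species
p_per_species <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Count how many separate curves each smoothing layer (layer 2) draws
length(unique(layer_data(p_single, 2)$group))        # 1
length(unique(layer_data(p_per_species, 2)$group))   # 3
```

In Step 5.2 the color mapping sits only in geom_smooth(), so there are again three curves, but the points are not colored by species.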

Step 6: Faceting

Step 6.0: Create a toy dataset

Let’s generate a hypothetical iris with some added ecosystem type and precipitation data.

# Randomly assign an ecosystem and a precipitation level to each of the 150 rows
ecosys <- sample(c("Forest", "Riparian", "Urban"), size = 150, replace = TRUE)
precp <- sample(c("Heavy", "Mild"), size = 150, replace = TRUE)

iris2 <- cbind(iris, Ecosystem=ecosys, Precipitation=precp)
head(iris2)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Ecosystem
## 1          5.1         3.5          1.4         0.2  setosa    Forest
## 2          4.9         3.0          1.4         0.2  setosa     Urban
## 3          4.7         3.2          1.3         0.2  setosa  Riparian
## 4          4.6         3.1          1.5         0.2  setosa    Forest
## 5          5.0         3.6          1.4         0.2  setosa  Riparian
## 6          5.4         3.9          1.7         0.4  setosa     Urban
##   Precipitation
## 1         Heavy
## 2          Mild
## 3         Heavy
## 4          Mild
## 5          Mild
## 6         Heavy
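
Because sample() is random, each run assigns different labels, so your rows will not match the output above. If a repeatable toy dataset is wanted, one option is to fix the random seed first. The seed value below is arbitrary, and the resulting labels will still differ from the printout above.

```r
set.seed(2019)  # any fixed number works; 2019 is an arbitrary choice

ecosys <- sample(c("Forest", "Riparian", "Urban"), size = nrow(iris), replace = TRUE)
precp  <- sample(c("Heavy", "Mild"), size = nrow(iris), replace = TRUE)

iris2 <- cbind(iris, Ecosystem = ecosys, Precipitation = precp)

# Check how the random labels are spread across the combinations
table(iris2$Ecosystem, iris2$Precipitation)
```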

Step 6.1: Facet iris2

Now, I would like to see how my previous graph changes for the different types of ecosystem and precipitation.

This was the graph:
- I am not using geom_smooth() for now because, once the data is split into facets, each panel has too few points to fit a smoother.
- I will also remove the alpha aesthetic to make the points easier to see.

p2 <- ggplot(data=iris2, aes(x=Sepal.Width, y=Sepal.Length, color=Species))
p2 <- p2 + geom_point(aes(size=Petal.Length)) # + geom_smooth() 
p2

Now I added facets!

p2 + facet_grid(Ecosystem ~ Precipitation) 

I can customize the facets very easily!

p2 + facet_grid( . ~ Precipitation) 

p2 + facet_grid(Ecosystem ~ .) 

p2 + facet_grid(Precipitation ~ .) 

You get the idea, right?
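
facet_grid() also accepts the row and column variables through the rows and cols arguments with vars(), which some people find easier to read than the formula. A self-contained sketch, rebuilding a toy iris2 with an arbitrary seed so that the block runs on its own:

```r
library(ggplot2)

# Rebuild a toy iris2 so this block runs on its own
set.seed(2019)  # arbitrary seed
iris2 <- cbind(iris,
               Ecosystem     = sample(c("Forest", "Riparian", "Urban"), nrow(iris), replace = TRUE),
               Precipitation = sample(c("Heavy", "Mild"), nrow(iris), replace = TRUE))

p_facets <- ggplot(iris2, aes(x = Sepal.Width, y = Sepal.Length, color = Species)) +
  geom_point(aes(size = Petal.Length)) +
  facet_grid(rows = vars(Ecosystem), cols = vars(Precipitation))  # same as Ecosystem ~ Precipitation
p_facets
```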

Step 6.2: Facet wages

Use facet_wrap() when you want to facet by a single variable and wrap the panels into a tidy grid.

wages <- read_csv("data/wages.csv")  # read_csv() comes from readr, loaded with tidyverse

# Split age into 10 equal-width intervals
wages$age_cat <- cut(wages$age, breaks = 10)
head(wages, 4)
## # A tibble: 4 x 7
##     earn height sex    race     ed   age age_cat    
##    <dbl>  <dbl> <chr>  <chr> <dbl> <dbl> <fct>      
## 1 79571.   73.9 male   white    16    49 (43.9,51.2]
## 2 96397.   66.2 female white    16    62 (58.5,65.8]
## 3 48711.   63.8 female white    16    33 (29.3,36.6]
## 4 80478.   63.2 female other    16    95 (87.7,95.1]
pw <- ggplot(wages, aes(x=height, y=earn)) +
      geom_point(aes(size=ed), alpha=0.5)
pw

pw + facet_wrap(~age_cat)

Or you can specify the number of rows or columns for the facets

pw + facet_wrap(~age_cat, ncol=5)
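
The panel titles come from age_cat, which cut(wages$age, breaks = 10) labels with interval notation such as (43.9,51.2]. If friendlier facet titles are wanted, cut() also accepts explicit break points and labels. A sketch with hypothetical ages and made-up group boundaries, so that it runs without wages.csv:

```r
# Hypothetical ages, used here only so the sketch runs without wages.csv
ages <- c(22, 35, 49, 62, 95)

age_groups <- cut(ages,
                  breaks = c(0, 30, 50, 70, Inf),          # bin edges
                  labels = c("under 30", "30-49", "50-69", "70+"),
                  right  = FALSE)                          # intervals are [lower, upper)
age_groups
table(age_groups)
```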

Your turn (5 mins)

Plot the wages.csv data like the following